The parallel language FORK [1], based on a scalable shared memory model, is a PASCAL-like language with some additional parallel constructs. A PRAM (Parallel Random Access Machine) algorithm can be expressed on a high level of abstraction as a FORK program which is translated into efficient PRAM code guaranteeing theoretically predicted runtimes.
In this paper, we concentrate on those features of the language FORK related to parallelism, such as the group concept, shared memory access, and synchronous or asynchronous execution. We present a trace-based denotational interleaving semantics where processes describe synchronous computations. Processes are created or deleted dynamically and run asynchronously. Interleaving rules reflect the underlying CRCW (concurrent-read-concurrent-write) PRAM model.
A graph is distance-hereditary if the distance between any two vertices is the same in every connected induced subgraph containing both. Two well-known classes of graphs, trees and cographs, are both distance-hereditary. In this paper, we first show that the perfect domination problem can be solved in linear sequential time on distance-hereditary graphs. By sketching some regular properties of the problem, we also show that it can be easily parallelized on distance-hereditary graphs.
Consider the problem of efficiently simulating the shared-memory parallel random access machine (PRAM) model on massively parallel architectures with physically distributed memory. To prevent network congestion and memory bank contention, it may be advantageous to hash the shared memory address space. The decision on whether or not to use hashing depends on (1) the communication latency in the network and (2) the locality of memory accesses in the algorithm. We relate this decision directly to algorithmic issues by studying the complexity of hashing in the Block PRAM model of Aggarwal, Chandra, and Snir, a shared-memory model of parallel computation which accounts for communication locality. For this model, we exhibit a universal family of hash functions having optimal locality. The complexity of applying these hash functions to the shared address space of the Block PRAM (i.e., by permuting data elements) is asymptotically equivalent to the complexity of performing a square matrix transpose, and this result is best possible for all pairwise independent universal hash families. These complexity bounds provide theoretical evidence that hashing and randomized routing need not destroy communication locality, addressing an open question of Valiant.
This work was started when the author was a student at Oxford University, supported by a National Science Foundation Graduate Fellowship and a Rhodes Scholarship. Any opinions, findings, conclusions, or recommendations expressed in this publication are those of the author and do not necessarily reflect the views of the National Science Foundation or the Rhodes Trust.
Given a set S of n proper circular arcs, it is required to identify a largest-cardinality subset K[S] of S in which every two members intersect. This paper describes an optimal parallel algorithm to compute K[S]. The algorithm is not based on any previously known sequential solution, and is designed for the CREW PRAM model of computation. It uses O(n/log n) processors and runs in O(log n) time. An interesting feature of the algorithm is that it transforms the computational geometric problem at hand into a problem involving computations on 0-1 matrices, and then transforms the latter back into a ray shooting problem in computational geometry.
Given an array of n input numbers, the range-maxima problem is that of preprocessing the data so that queries of the type "what is the maximum value in subarray [i..j]" can be answered quickly using one processor. We present a randomized preprocessing algorithm that runs in O(log* n) time with high probability, using an optimal number of processors on a CRCW PRAM; each query can be processed in constant time by one processor. We also present a randomized algorithm for a parallel comparison model. Using an optimal number of processors, the preprocessing algorithm runs in O(α(n)) time with high probability; each query can be processed in O(α(n)) time by one processor. (As is standard, α(n) is the inverse of the Ackermann function.) A constant-time query can be achieved at the price of some slowdown in the preprocessing stage.
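To make the preprocess-then-query pattern concrete, here is a minimal sequential sketch using a sparse table. This is a standard textbook technique swapped in for illustration, not the randomized parallel algorithm of the abstract: it spends O(n log n) time and space on preprocessing, after which each query takes constant time. The names `build_sparse_table` and `range_max` are hypothetical.

```python
def build_sparse_table(values):
    """Preprocess so that table[k][i] == max(values[i : i + 2**k]).

    Sequential sparse table (illustrative assumption, not the paper's method).
    """
    n = len(values)
    table = [list(values)]  # level 0: windows of length 1
    k = 1
    while (1 << k) <= n:
        prev = table[k - 1]
        half = 1 << (k - 1)
        # A window of length 2**k is the union of two windows of length 2**(k-1).
        table.append(
            [max(prev[i], prev[i + half]) for i in range(n - (1 << k) + 1)]
        )
        k += 1
    return table


def range_max(table, i, j):
    """Return the maximum of values[i..j] (0-based, inclusive, i <= j)."""
    k = (j - i + 1).bit_length() - 1  # largest power of two fitting in the range
    # Two overlapping windows of length 2**k cover [i..j] exactly.
    return max(table[k][i], table[k][j - (1 << k) + 1])


table = build_sparse_table([3, 1, 4, 1, 5, 9, 2, 6])
print(range_max(table, 2, 4))  # maximum of [4, 1, 5]
```

The overlap of the two windows is harmless because taking a maximum is idempotent, which is what allows the constant-time query.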
We present a randomized EREW PRAM algorithm to find a minimum spanning forest in a weighted undirected graph. On an n-vertex graph the algorithm runs in o((log n)^{1+ɛ}) expected time for any ɛ > 0 and performs linear expected work. This is the first linear-work, polylog-time algorithm on the EREW PRAM for this problem. This also gives parallel algorithms that perform expected linear work on two general-purpose models of parallel computation: the QSM and the BSP.
Traditionally, the block-based medial axis transform (BB-MAT) and the chessboard distance transform (CDT) have been viewed as two completely different image computation problems, especially in three-dimensional (3D) space. In fact, they share some equivalent properties. This paper is the first to derive and prove the relationship between them. One significant property is that the CDT of a 3D binary image V equals the BB-MAT of the image V', where V' denotes the inverse image of V. For a parallel algorithm, the cost is defined as the product of the time complexity and the number of processors used. The main contribution of this work is to reduce the costs of the 3D BB-MAT and 3D CDT algorithms proposed by Wang [65]. Based on the reverse-dominance technique, which is redefined from the dominance concept, we compute the 3D CDT by first running the 3D BB-MAT algorithm. For a 3D binary image of size N^3, our parallel algorithm solves both the 3D BB-MAT and 3D CDT problems in O(log N) time using N^3 processors on the concurrent read exclusive write (CREW) parallel random access machine (PRAM) model. These costs are lower than those of Wang's algorithms and, to the best of our knowledge, are the lowest known for 3D BB-MAT and 3D CDT algorithms. The running time of a parallel algorithm can be divided into computation time and communication time. We measured the running, communication, and computation times for different problem sizes on an HP Superdome with an SMP/CC-NUMA (symmetric multiprocessor/cache coherent non-uniform memory access) architecture. We conclude that a parallel computer (i.e., an SMP/CC-NUMA architecture or a cluster system) is more suitable for solving problems with large inputs.
Given a set of n intervals representing an interval graph, the problem of finding a maximum matching between pairs of disjoint (nonintersecting) intervals has been considered in the sequential model. In this paper we present parallel algorithms for computing maximum cardinality matchings among pairs of disjoint intervals in interval graphs in the EREW PRAM and hypercube models. For the general case of the problem, our algorithms compute a maximum matching in O(log^3 n) time using O(n/log^2 n) processors on the EREW PRAM and using n processors on the hypercube. For the case of proper interval graphs, our algorithm runs in O(log n) time using O(n) processors if the input intervals are not already sorted and using O(n/log n) processors otherwise, on the EREW PRAM. On n-processor hypercubes, our algorithm for the proper interval case takes O(log n log log n) time for unsorted input and O(log n) time for sorted input. Our parallel results also lead to optimal sequential algorithms for computing maximum matchings among disjoint intervals. In addition, we present an improved parallel algorithm for maximum matching between overlapping intervals in proper interval graphs.
Received November 20, 1995; revised September 3, 1998.
Spatial regularity amidst a seemingly chaotic image is often meaningful. Many papers in computational geometry are concerned with detecting some type of regularity via exact solutions to problems in geometric pattern recognition. However, real-world applications often have data that is approximate, and may rely on calculations that are approximate. Thus, it is useful to develop solutions that have an error tolerance.
A solution has recently been presented by Robins et al. [Inform. Process. Lett. 69 (1999) 189–195] to the problem of finding all maximal subsets of an input set in the Euclidean plane that are approximately equally-spaced and approximately collinear. This is a problem that arises in computer vision, military applications, and other areas. The algorithm of Robins et al. is different in several important respects from the optimal algorithm given by Kahng and Robins [Pattern Recognition Lett. 12 (1991) 757–764] for the exact version of the problem. The algorithm of Robins et al. seems inherently sequential and runs in O(n^{5/2}) time, where n is the size of the input set. In this paper, we give parallel solutions to this problem.
A number of highly-threaded, many-core architectures hide memory-access latency by low-overhead context switching among a large number of threads. The speedup of a program on these machines depends on how well the latency is hidden. If the number of threads were infinite, theoretically, these machines could provide the performance predicted by the PRAM analysis of these programs. However, the number of threads per processor is not infinite, and is constrained by both hardware and algorithmic limits. In this paper, we introduce the Threaded Many-core Memory (TMM) model which is meant to capture the important characteristics of these highly-threaded, many-core machines. Since we model some important machine parameters of these machines, we expect analysis under this model to provide a more fine-grained and accurate performance prediction than the PRAM analysis. We analyze 4 algorithms for the classic all pairs shortest paths problem under this model. We find that even when two algorithms have the same PRAM performance, our model predicts different performance for some settings of machine parameters. For example, for dense graphs, the dynamic programming algorithm and Johnson’s algorithm have the same performance in the PRAM model. However, our model predicts different performance for large enough memory-access latency and validates the intuition that the dynamic programming algorithm performs better on these machines. We validate several predictions made by our model using empirical measurements on an instantiation of a highly-threaded, many-core machine, namely the NVIDIA GTX 480.
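For readers unfamiliar with the all pairs shortest paths problem, the sketch below shows one textbook dynamic-programming formulation, the Floyd-Warshall recurrence, run sequentially. The abstract does not say which dynamic-programming variant the authors analyze, so treat this as an illustrative assumption rather than their algorithm; the function name `floyd_warshall` and the adjacency-matrix input format are likewise hypothetical.

```python
import math


def floyd_warshall(weights):
    """All pairs shortest paths on a dense graph given as an n x n matrix.

    weights[i][j] is the edge weight from i to j, math.inf if there is no
    edge, and 0 on the diagonal. Assumes no negative cycles.
    """
    n = len(weights)
    dist = [row[:] for row in weights]  # copy so the input is not modified
    for k in range(n):  # allow vertex k as an intermediate stop
        for i in range(n):
            for j in range(n):
                via_k = dist[i][k] + dist[k][j]
                if via_k < dist[i][j]:
                    dist[i][j] = via_k
    return dist


inf = math.inf
graph = [
    [0, 3, inf],
    [inf, 0, 1],
    [2, inf, 0],
]
print(floyd_warshall(graph))
```

The regular, data-independent access pattern of the triple loop is the kind of property that makes dense dynamic programming attractive on latency-hiding many-core hardware.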